Super-Fast XML Wrapper Generation in DB2: A Demonstration

Authors

  • Vanja Josifovski
  • Sabine Maßmann
  • Felix Naumann
Abstract

The XML Wrapper is a new feature of the federated database capabilities of DB2/UDB v8. It enables users and applications to issue SQL queries against XML data from a variety of sources, including files and web services. The XML Wrapper treats hierarchical XML documents as families of virtual relational tables in a federated schema, which can then be queried to extract information from the XML and combine it with data from other sources. Using the XML Wrapper is inherently complex and requires several difficult steps: (i) The hierarchical schema of the source must be flattened to a relational form. (ii) Each relation of the flattened schema must be registered in DB2 as a NICKNAME, a complex virtual table definition containing several XPaths as specialized options. (iii) Each NICKNAME must be accompanied by a VIEW, again a complex structure involving join conditions. Chocolate is a tool that eases all three tasks: it provides several flattening strategies and an interface that lets users modify the automatically generated target schema. Once the user is satisfied with the schema, Chocolate automatically generates the corresponding NICKNAME and VIEW definitions.
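
Step (i) above can be illustrated with a minimal sketch. It is not Chocolate's algorithm: it assumes one simple flattening strategy (every repeating element becomes its own relation, and non-repeating descendants are inlined as columns), and the names `Element`, `Relation`, and `flatten`, as well as the customer/order sample schema, are hypothetical. Generating actual NICKNAME and VIEW definitions is deliberately left out.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Element:
    """A node of a hierarchical (XML-like) schema. Hypothetical helper type."""
    name: str
    repeating: bool = False
    children: List["Element"] = field(default_factory=list)


@dataclass
class Relation:
    """One relation of the flattened schema. Hypothetical helper type."""
    name: str
    row_xpath: str              # XPath selecting one row, relative to the parent row
    columns: Dict[str, str]     # column name -> XPath relative to the row
    parent: Optional[str]       # name of the parent relation, if any


def flatten(root: Element) -> List[Relation]:
    """Flatten a schema tree: one relation per repeating element (plus the root)."""
    relations: List[Relation] = []

    def visit(elem: Element, parent: Optional[str], row_xpath: str) -> None:
        rel = Relation(elem.name, row_xpath, {}, parent)
        relations.append(rel)

        def collect(node: Element, path: str) -> None:
            for child in node.children:
                child_path = f"{path}/{child.name}"
                if child.repeating:
                    # A repeating child cannot be a column; it becomes a child relation.
                    visit(child, rel.name, child_path)
                elif child.children:
                    # A non-repeating inner node is inlined into the current relation.
                    collect(child, child_path)
                else:
                    # A leaf becomes a column. Name clashes are not handled here.
                    rel.columns[child.name] = child_path

        collect(elem, ".")

    visit(root, None, f"//{root.name}")
    return relations


# Made-up sample schema: customers with an address and repeating orders/items.
schema = Element("customer", repeating=True, children=[
    Element("name"),
    Element("address", children=[Element("city"), Element("zip")]),
    Element("order", repeating=True, children=[
        Element("amount"),
        Element("item", repeating=True, children=[Element("quantity")]),
    ]),
])

relations = flatten(schema)
for r in relations:
    print(r.name, r.row_xpath, r.columns, "parent:", r.parent)
```

Each resulting relation carries the information that steps (ii) and (iii) would need: a row XPath and column XPaths for a virtual table definition, and a parent link from which join conditions could be derived.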

Similar resources

Data Extraction using Content-Based Handles

In this paper, we present an approach and a visual tool, called HWrap (Handle Based Wrapper), for creating web wrappers to extract data records from web pages. In our approach, we rely mainly on the visible page content to identify data regions on a web page. Our extraction algorithm is inspired by the way a human user scans the page content for specific data. In particular, we use text fea...

XML and DB2

The eXtensible Markup Language (XML) is a key technology that facilitates both information exchange and e-business transactions. Starting with DB2 UDB Net.Data V1, an application can generate XML documents from SQL queries against DB2 or any ODBC compliant databases. Today DB2 UDB XML Extender not only serves as a repository for both XML documents and their Document Type Definitions (DTDs), but...

Supporting unified interface to wrapper generator in Integrated Information Retrieval

Given the ever-increasing scale and diversity of information and applications on the Internet, improving the technology of information retrieval is an urgent research objective. Retrieved information is either semi-structured or unstructured in format and its sources are extremely heterogeneous. In consequence, the task of efficiently gathering and extracting information from documents can be b...

Semantic Wrappers for Semi-Structured Data Extraction

In this paper, we propose an approach to extract information from HTML pages and to add semantic (XML) tags to them. Wrapping is an essential technique used to automatically extract information from Web sources. This paper describes both a general rule-based approach, which can be used to automatically generate wrappers, and a wrapper generation assistant called WebMantic. We also provide ...

Publication date: 2003